Extraction of Protein-Protein Interaction from Scientific Articles by Predicting Dominant Keywords

Authors

  • Shun Koyabu
  • Thi Thanh Thuy Phan
  • Takenao Ohkawa

Abstract

For the automatic extraction of protein-protein interaction information from scientific articles, a machine learning approach is useful. A classifier is trained on data represented by several features and decides whether a protein pair in each sentence has an interaction. Specific keywords that are directly related to interaction, such as "bind" or "interact", play an important role in training classifiers. We call such a keyword a dominant keyword because it affects the capability of the classifier. Although it is important to identify dominant keywords, whether a keyword is dominant depends on the context in which it occurs. Therefore, we propose a method for predicting whether a keyword is dominant in each instance. In this method, a keyword that yields imbalanced classification results is initially assumed, tentatively, to be a dominant keyword. Classifiers are then trained separately on the instances with and without the assumed dominant keywords. The validity of each assumed dominant keyword is evaluated based on the classification results of the generated classifiers, and the assumption is updated according to this evaluation. Repeating this process increases the accuracy of dominant keyword prediction. Our experimental results on five corpora show the effectiveness of the proposed method with dominant keyword prediction.
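The abstract describes the iterative procedure only at a high level. The following is a minimal sketch of one possible reading of it, not the authors' implementation. Several details are assumptions:

- The names `Instance`, `imbalanced_keywords`, `KeywordNB` and `predict_dominant` are hypothetical, as are the `threshold`, `min_count` and `max_iter` values.
- Imbalance is measured on gold labels rather than on classifier outputs.
- A tiny Naive Bayes classifier stands in for whatever classifier the paper uses.
- The rule for revising an instance's assumption is invented for illustration. If only the classifier trained without dominant keywords gets the instance right, the assumption is dropped. In the reverse case, the assumption is adopted.

```python
import math
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Instance:
    """One protein pair in one sentence (hypothetical representation)."""
    keywords: frozenset                          # keywords occurring in the sentence
    label: int                                   # 1 = interaction, 0 = no interaction
    dominant: set = field(default_factory=set)   # keywords assumed dominant in THIS instance


def imbalanced_keywords(instances, threshold=0.8, min_count=2):
    """Initial guess: keywords whose instances fall mostly into one class.

    Assumption: imbalance is measured on gold labels; threshold values are illustrative.
    """
    total, positive = Counter(), Counter()
    for inst in instances:
        for kw in inst.keywords:
            total[kw] += 1
            positive[kw] += inst.label
    return {kw for kw, n in total.items()
            if n >= min_count and max(positive[kw], n - positive[kw]) / n >= threshold}


class KeywordNB:
    """Tiny multinomial Naive Bayes over keyword sets (stand-in classifier)."""

    def fit(self, instances):
        self.class_count = Counter(inst.label for inst in instances)
        self.kw_count = {0: Counter(), 1: Counter()}
        for inst in instances:
            self.kw_count[inst.label].update(inst.keywords)
        self.vocab = set().union(*(inst.keywords for inst in instances))
        return self

    def predict(self, inst):
        n = sum(self.class_count.values())
        best, best_score = 0, -math.inf
        for c in (0, 1):
            # Laplace-smoothed log prior
            score = math.log((self.class_count[c] + 1) / (n + 2))
            # Laplace-smoothed likelihoods; the extra +1 reserves mass for unseen keywords
            denom = sum(self.kw_count[c].values()) + len(self.vocab) + 1
            for kw in inst.keywords:
                score += math.log((self.kw_count[c][kw] + 1) / denom)
            if score > best_score:
                best, best_score = c, score
        return best


def predict_dominant(instances, max_iter=10, **init_kwargs):
    """Iteratively revise, per instance, whether its candidate keywords are dominant."""
    candidates = imbalanced_keywords(instances, **init_kwargs)
    for inst in instances:  # tentative assumption: every candidate keyword is dominant
        inst.dominant = set(inst.keywords & candidates)

    for _ in range(max_iter):
        with_dom = [i for i in instances if i.dominant]
        without = [i for i in instances if not i.dominant]
        if not with_dom or not without:
            break  # cannot train both classifiers
        clf_with = KeywordNB().fit(with_dom)
        clf_without = KeywordNB().fit(without)
        changed = False
        for inst in instances:
            right_with = clf_with.predict(inst) == inst.label
            right_without = clf_without.predict(inst) == inst.label
            cand = inst.keywords & candidates
            # Hypothetical update rule: trust whichever classifier handles the instance
            if inst.dominant and right_without and not right_with:
                inst.dominant = set()          # keyword not dominant in this context
                changed = True
            elif not inst.dominant and cand and right_with and not right_without:
                inst.dominant = set(cand)      # keyword behaves as dominant here
                changed = True
        if not changed:
            break  # assumptions are stable
    return instances
```

Consider a toy set of seven instances. In it, "bind" occurs only in positive instances and "expression" only in negative ones. Other keywords such as "protein", "complex" and "interact" appear once in each class. With `threshold=0.8` and `min_count=2`, `imbalanced_keywords` returns `{"bind", "expression"}` as the initial candidates. `predict_dominant` then keeps or drops that assumption instance by instance.

Training two separate classifiers mirrors the abstract's "with and without" split. The `max_iter` cap guards against assumptions that oscillate between iterations.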

Similar Resources

Construction of phosphorylation interaction networks by text mining of full-length articles using the eFIP system

Protein phosphorylation is a reversible post-translational modification where a protein kinase adds a phosphate group to a protein, potentially regulating its function, localization and/or activity. Phosphorylation can affect protein-protein interactions (PPIs), abolishing interaction with previous binding partners or enabling new interactions. Extracting phosphorylation information coupled wit...

Optimum conditions for protein extraction from tuna processing by-products using isoelectric solubilization and precipitation processes

The by-product from tuna processing is a potential source of edible protein. Therefore, it is very important to extract protein from such raw materials for human food. In this study the optimum pH for protein extraction from tuna by-products was optimized by using isoelectric solubilization and precipitation processes. The Response Surface Methodology (RSM) and the single factor model were used...

PIE the search: searching PubMed literature for protein interaction information

MOTIVATION Finding protein-protein interaction (PPI) information from literature is challenging but an important issue. However, keyword search in PubMed(®) is often time consuming because it requires a series of actions that refine keywords and browse search results until it reaches a goal. Due to the rapid growth of biomedical literature, it has become more difficult for biologists and curato...

Predicting Protein-Protein Interactions from Protein Sequences Using Phylogenetic Profiles

In this study, a high accuracy protein-protein interaction prediction method is developed. The importance of the proposed method is that it only uses sequence information of proteins while predicting interaction. The method extracts phylogenetic profiles of proteins by using their sequence information. Combining the phylogenetic profiles of two proteins by checking existence of homologs in diff...

Volume: 2015

Publication date: 2015